My Docker Hung. What Now? (Waiting Online for Answers)

2023-09-02 04:53 | Source: compiled from the web

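1. Background

We recently rolled out a new kubelet build to fix a problem: kubelet deleted Pods so slowly that cluster deletion on our platform timed out. While canarying the build on the isolated Redis clusters, we noticed something after kubelet was upgraded and restarted. A small number of hosts went NotReady, and they stayed NotReady even after kubelet was rolled back to the previous version. The host status reported 'container runtime is down'. In our experience this almost always means the container runtime itself is in trouble. Our Elastic Cloud platform uses docker as its container runtime, so we checked docker's state. The results were:

docker ps (lists the state of all containers): returns normally

docker inspect (shows the detailed state of one container): blocks

This is classic docker hang behavior. We are in the middle of a docker upgrade. Existing hosts run docker 1.13.1 and are being moved to 18.06.3 gradually, while all new hosts already run 18.06.3. On 1.13.1 the hang is more thorough: docker ps itself hangs, so as soon as one container gets into trouble the whole docker daemon stops responding. Docker 18.06.3 includes a small improvement: docker ps no longer takes per-container locks, although docker inspect still takes the container lock. As a result, one bad container no longer makes the docker service unresponsive. Only that container is affected, and no operation on it can be performed.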
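Here is a toy Go model that makes the locking difference concrete. All type and function names are hypothetical, not docker's code. As described for 18.06.3, ps walks the container list without touching per-container locks, while inspect needs the container's own mutex. The demo uses sync.Mutex.TryLock (Go 1.18+) only to observe that inspect would block; real docker simply waits on Lock.

```go
package main

import "sync"

// container is a hypothetical stand-in for a docker container record.
type container struct {
	name string
	mu   sync.Mutex // per-container lock, taken by inspect
}

// daemon is a hypothetical stand-in for dockerd's container store.
type daemon struct {
	containers []*container
}

// ps lists container names without taking any per-container lock,
// mirroring the 18.06.3 behaviour described above.
func (d *daemon) ps() []string {
	names := make([]string, 0, len(d.containers))
	for _, c := range d.containers {
		names = append(names, c.name)
	}
	return names
}

// canInspect reports whether an inspect of c could proceed right now.
// A real inspect would call c.mu.Lock() and wait; TryLock is used only so
// the demo can observe the contention without hanging.
func (d *daemon) canInspect(c *container) bool {
	if !c.mu.TryLock() {
		return false
	}
	c.mu.Unlock()
	return true
}
```

Under the 1.13.1 behaviour, ps would also need each container's lock. In this model, that would make one stuck container stall the whole listing.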

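Why use docker ps and docker inspect to judge docker's health? Because these are exactly the two docker APIs that kubelet relies on to obtain container state.

So we now have two questions:

What is the root cause of the docker hang?

While docker is hung, why does restarting kubelet turn the host NotReady?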

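2. Restarting kubelet changes the host status

The host flipping from Ready to NotReady after a kubelet restart is a simpler problem than the docker hang, so we investigate it first.

kubelet sets several Conditions on each host to describe its current state. Examples are whether memory is running low, whether the thread count is near its limit, and whether the host is ready. ReadyCondition indicates whether the host is ready, and the Status that kubectl shows for a node is the content of ReadyCondition. The common values and their meanings are:

Ready: everything on the host is OK, and it can respond to Pod events normally

NotReady: the host's kubelet is still running but can no longer handle Pod events. In the vast majority of cases, NotReady means the container runtime has a problem

Unknown: the host's kubelet has stopped running (more precisely, the control plane has stopped receiving status updates from it)

kubelet registers the setters that compute the node conditions, including ReadyCondition, as follows: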

// defaultNodeStatusFuncs is a factory that generates the default set of
// setNodeStatus funcs
func (kl *Kubelet) defaultNodeStatusFuncs() []func(*v1.Node) error {
    ......
    setters = append(setters,
        nodestatus.OutOfDiskCondition(kl.clock.Now, kl.recordNodeStatusEvent),
        nodestatus.MemoryPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderMemoryPressure, kl.recordNodeStatusEvent),
        nodestatus.DiskPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderDiskPressure, kl.recordNodeStatusEvent),
        nodestatus.PIDPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderPIDPressure, kl.recordNodeStatusEvent),
        nodestatus.ReadyCondition(kl.clock.Now, kl.runtimeState.runtimeErrors, kl.runtimeState.networkErrors, validateHostFunc, kl.containerManager.Status, kl.recordNodeStatusEvent),
        nodestatus.VolumesInUse(kl.volumeManager.ReconcilerStatesHasBeenSynced, kl.volumeManager.GetVolumesInUse),
        // TODO(mtaufen): I decided not to move this setter for now, since all it does is send an event
        // and record state back to the Kubelet runtime object. In the future, I'd like to isolate
        // these side-effects by decoupling the decisions to send events and partial status recording
        // from the Node setters.
        kl.recordNodeSchedulableEvent,
    )
    return setters
}

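The implementation of nodestatus.ReadyCondition shows that whether a host is Ready depends on many checks, including a runtime check, a network check and basic resource checks. Here we only need to look at the runtime check: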

func (s *runtimeState) runtimeErrors() []string {
    s.RLock()
    defer s.RUnlock()
    var ret []string
    if !s.lastBaseRuntimeSync.Add(s.baseRuntimeSyncThreshold).After(time.Now()) { // 1
        ret = append(ret, "container runtime is down")
    }
    if s.internalError != nil {
        ret = append(ret, s.internalError.Error())
    }
    for _, hc := range s.healthChecks { // 2
        if ok, err := hc.fn(); !ok {
            ret = append(ret, fmt.Sprintf("%s is not healthy: %v", hc.name, err))
        }
    }
    return ret
}

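The runtime check fails if either of the following holds:

Condition 1 (the check marked // 1): more time than a given threshold (30s by default) has passed since the last runtime sync

Condition 2 (the loop marked // 2): a runtime health check has failed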

So which condition made these hosts NotReady? The kubelet log contains a line like the following every 5s:

......
I0715 10:43:28.049240   16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
I0715 10:43:33.049359   16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
I0715 10:43:38.049492   16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
......

Therefore condition 1 is the culprit behind the NotReady hosts.

Next we looked at why kubelet did not update lastBaseRuntimeSync as expected. At startup kubelet creates a goroutine that sets lastBaseRuntimeSync in a loop, as follows:

func (kl *Kubelet) Run(updates
kl.cadvisor.Start -> cc.Manager.Start -> self.createContainer -> m.createContainerLocked -> container.NewContainerHandler -> factory.CanHandleAndAccept -> self.client.ContainerInspect

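Because one container was in a bad state, the docker inspect that kubelet issued along this path hung as well.

So the root cause of the Ready-to-NotReady transition after a kubelet restart is that one container was in a bad state and docker inspect hung on it. If the docker inspect hang happens after kubelet has restarted, the host's Ready status is not affected at all. The reason is that oneTimeInitializer is a sync.Once, so it runs only once, when kubelet starts. In that case kubelet cannot handle any event for that particular Pod (deletion, updates and so on). It can still handle every event for all other Pods.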

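One might ask why kubelet does not put a timeout on docker inspect during a restart. Indeed, with a timeout in place, restarting kubelet would not change the host status. We will return to this after digging deeper. For now, let us continue with the docker hang.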

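3. The docker hang

Docker hangs are nothing new to us. We have already had quite a few, with a wide variety of symptoms. Earlier investigations on docker 1.13.1 turned up some clues but never pinned down the root cause, and most of those cases were resolved by restarting docker. This time the hang happened on docker 18.06.3. After nearly a week of examining the patient from every angle, our four-person team finally identified the cause. Note that docker can hang for more than one reason, so this prescription is not a cure-all.

At this point, all we knew was that docker was unhealthy and could not respond to docker inspect for one particular container. We knew nothing about the details.

Tracing the call chain

First, we wanted a rough picture of what docker was doing as a whole. Anyone familiar with Go will immediately think of pprof. With pprof we captured this snapshot of docker at the time: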

goroutine profile: total 722373

717594 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2132ef 0x7fe8bc238c1a 0x7fe8bd56f7fe 0x7fe8bd56f6bd 0x7fe8bcea8719 0x7fe8bcea938b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
#  0x7fe8bc2132ee  sync.runtime_SemacquireMutex+0x3e  /usr/local/go/src/runtime/sema.go:71
#  0x7fe8bc238c19  sync.(*Mutex).Lock+0x109  /usr/local/go/src/sync/mutex.go:134
#  0x7fe8bd56f7fd  github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+0x8d  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:40
#  0x7fe8bd56f6bc  github.com/docker/docker/daemon.(*Daemon).ContainerInspect+0x11c  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:29
#  0x7fe8bcea8718  github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+0x118  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.go:15
#  0x7fe8bcea938a  github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+0x6a  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:39
#  0x7fe8bcb726c9  github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
#  0x7fe8bcb72b00  github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
#  0x7fe8bc71c26a  github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
#  0x7fe8bcb85f49  github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
#  0x7fe8bc4b9895  net/http.HandlerFunc.ServeHTTP+0x45  /usr/local/go/src/net/http/server.go:1947
#  0x7fe8bc72a437  github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
#  0x7fe8bcb849e1  github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
#  0x7fe8bc4bc67d  net/http.serverHandler.ServeHTTP+0xbd  /usr/local/go/src/net/http/server.go:2694
#  0x7fe8bc4b88a2  net/http.(*conn).serve+0x652  /usr/local/go/src/net/http/server.go:1830

4175 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2132ef 0x7fe8bc238c1a 0x7fe8bcc2eccf 0x7fe8bd597af4 0x7fe8bcea2456 0x7fe8bcea956b 0x7fe8bcb73dff 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
#  0x7fe8bc2132ee  sync.runtime_SemacquireMutex+0x3e  /usr/local/go/src/runtime/sema.go:71
#  0x7fe8bc238c19  sync.(*Mutex).Lock+0x109  /usr/local/go/src/sync/mutex.go:134
#  0x7fe8bcc2ecce  github.com/docker/docker/container.(*State).IsRunning+0x2e  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/state.go:240
#  0x7fe8bd597af3  github.com/docker/docker/daemon.(*Daemon).ContainerStats+0xb3  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/stats.go:30
#  0x7fe8bcea2455  github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersStats+0x1e5  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container_routes.go:115
#  0x7fe8bcea956a  github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersStats)-fm+0x6a  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:42
#  0x7fe8bcb73dfe  github.com/docker/docker/api/server/router.cancellableHandler.func1+0xce  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/local.go:92
#  0x7fe8bcb726c9  github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
#  0x7fe8bcb72b00  github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
#  0x7fe8bc71c26a  github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
#  0x7fe8bcb85f49  github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
#  0x7fe8bc4b9895  net/http.HandlerFunc.ServeHTTP+0x45  /usr/local/go/src/net/http/server.go:1947
#  0x7fe8bc72a437  github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
#  0x7fe8bcb849e1  github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
#  0x7fe8bc4bc67d  net/http.serverHandler.ServeHTTP+0xbd  /usr/local/go/src/net/http/server.go:2694
#  0x7fe8bc4b88a2  net/http.(*conn).serve+0x652  /usr/local/go/src/net/http/server.go:1830

1 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2131fb 0x7fe8bc239a3b 0x7fe8bcbb679d 0x7fe8bcc26774 0x7fe8bd570b20 0x7fe8bd56f81c 0x7fe8bd56f6bd 0x7fe8bcea8719 0x7fe8bcea938b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
#  0x7fe8bc2131fa  sync.runtime_Semacquire+0x3a  /usr/local/go/src/runtime/sema.go:56
#  0x7fe8bc239a3a  sync.(*RWMutex).RLock+0x4a  /usr/local/go/src/sync/rwmutex.go:50
#  0x7fe8bcbb679c  github.com/docker/docker/daemon/exec.(*Store).List+0x4c  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec/exec.go:140
#  0x7fe8bcc26773  github.com/docker/docker/container.(*Container).GetExecIDs+0x33  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/container.go:423
#  0x7fe8bd570b1f  github.com/docker/docker/daemon.(*Daemon).getInspectData+0x5cf  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:178
#  0x7fe8bd56f81b  github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+0xab  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:42
#  0x7fe8bd56f6bc  github.com/docker/docker/daemon.(*Daemon).ContainerInspect+0x11c  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:29
#  0x7fe8bcea8718  github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+0x118  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.go:15
#  0x7fe8bcea938a  github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+0x6a  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:39
#  0x7fe8bcb726c9  github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
#  0x7fe8bcb72b00  github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
#  0x7fe8bc71c26a  github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
#  0x7fe8bcb85f49  github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
#  0x7fe8bc4b9895  net/http.HandlerFunc.ServeHTTP+0x45  /usr/local/go/src/net/http/server.go:1947
#  0x7fe8bc72a437  github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
#  0x7fe8bcb849e1  github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
#  0x7fe8bc4bc67d  net/http.serverHandler.ServeHTTP+0xbd  /usr/local/go/src/net/http/server.go:2694
#  0x7fe8bc4b88a2  net/http.(*conn).serve+0x652  /usr/local/go/src/net/http/server.go:1830

1 @ 0x7fe8bc202980 0x7fe8bc212946 0x7fe8bc8b6881 0x7fe8bc8b699d 0x7fe8bc8e259b 0x7fe8bc8e1695 0x7fe8bc8c47d5 0x7fe8bd2e0c06 0x7fe8bd2eda96 0x7fe8bc8c42fb 0x7fe8bc8c4613 0x7fe8bd2a6474 0x7fe8bd2e6976 0x7fe8bd3661c5 0x7fe8bd56842f 0x7fe8bcea7bdb 0x7fe8bcea9f6b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711
#  0x7fe8bc8b6880  github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader+0x100  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222
#  0x7fe8bc8b699c  github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress+0x2c  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233
#  0x7fe8bc8e259a  github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg+0x63a  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515
#  0x7fe8bc8e1694  github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg+0x44  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395
#  0x7fe8bc8c47d4  github.com/docker/docker/vendor/google.golang.org/grpc.invoke+0x184  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83
#  0x7fe8bd2e0c05  github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary+0xf5  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35
#  0x7fe8bd2eda95  github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm+0xf5  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51
#  0x7fe8bc8c42fa  github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke+0x10a  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35
#  0x7fe8bc8c4612  github.com/docker/docker/vendor/google.golang.org/grpc.Invoke+0xc2  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60
#  0x7fe8bd2a6473  github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Start+0xd3  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:421
#  0x7fe8bd2e6975  github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Start+0xf5  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:109
#  0x7fe8bd3661c4  github.com/docker/docker/libcontainerd.(*client).Exec+0x4b4  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:381
#  0x7fe8bd56842e  github.com/docker/docker/daemon.(*Daemon).ContainerExecStart+0xb4e  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.go:251
#  0x7fe8bcea7bda  github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart+0x34a  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.go:125
#  0x7fe8bcea9f6a  github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm+0x6a  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:59
#  0x7fe8bcb726c9  github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+0xd9  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26
#  0x7fe8bcb72b00  github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+0x400  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62
#  0x7fe8bc71c26a  github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+0x7aa  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59
#  0x7fe8bcb85f49  github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+0x199  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141
#  0x7fe8bc4b9895  net/http.HandlerFunc.ServeHTTP+0x45  /usr/local/go/src/net/http/server.go:1947
#  0x7fe8bc72a437  github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0x227  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103
#  0x7fe8bcb849e1  github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+0x71  /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29
#  0x7fe8bc4bc67d  net/http.serverHandler.ServeHTTP+0xbd  /usr/local/go/src/net/http/server.go:2694
#  0x7fe8bc4b88a2  net/http.(*conn).serve+0x652  /usr/local/go/src/net/http/server.go:1830

Note that this is a trimmed-down dump of docker's goroutine stacks. From this snapshot we can draw the following conclusions:

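(1) 717594 goroutines are blocked in docker inspect

(2) 4175 goroutines are blocked in docker stats

(3) 1 goroutine is blocked while fetching the container's docker exec IDs

(4) 1 goroutine is blocked in the middle of executing docker exec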
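With hundreds of thousands of goroutines, tallying by hand is impractical. The sketch below is a rough parser for the `debug=1` text format; the helper names are hypothetical, written for this illustration. It groups stacks and reports the innermost frame outside the runtime and sync packages, which is usually where the goroutine is actually waiting.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// goroutineGroup is one "N @ addr..." entry of a debug=1 goroutine profile.
type goroutineGroup struct {
	Count  int
	Frames []string // function names, innermost first, offsets stripped
}

// parseGoroutineProfile is a rough parser for the debug=1 text format.
func parseGoroutineProfile(text string) []goroutineGroup {
	var groups []goroutineGroup
	sc := bufio.NewScanner(strings.NewReader(text))
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // frame lines can be long
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "#") {
			// Frame line: "#  0xADDR  pkg.func+0xOFF  /path/file.go:LINE"
			fields := strings.Fields(line)
			if len(groups) == 0 || len(fields) < 3 {
				continue
			}
			fn := fields[2]
			if i := strings.LastIndex(fn, "+0x"); i >= 0 {
				fn = fn[:i]
			}
			g := &groups[len(groups)-1]
			g.Frames = append(g.Frames, fn)
		} else if count, _, ok := strings.Cut(line, " @ "); ok {
			// Group header: "N @ 0x... 0x..."
			if n, err := strconv.Atoi(count); err == nil {
				groups = append(groups, goroutineGroup{Count: n})
			}
		}
	}
	return groups
}

// waitingIn returns the innermost frame outside the runtime and sync
// packages, which is usually the code the goroutine is stuck in.
func waitingIn(g goroutineGroup) string {
	for _, f := range g.Frames {
		if !strings.HasPrefix(f, "runtime.") && !strings.HasPrefix(f, "sync.") {
			return f
		}
	}
	return ""
}
```

Run over the dockerd dump above, waitingIn would report four functions, one per group: daemon.(*Daemon).ContainerInspectCurrent, container.(*State).IsRunning, daemon/exec.(*Store).List and grpc/transport.(*Stream).waitOnHeader. These line up with the four conclusions.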

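These findings largely explain why the faulty container hung. Docker exec in that container never returned (4), which blocked the fetching of exec IDs (3). Because (3) had already acquired the container lock, docker inspect (1) and docker stats (2) got stuck behind it. So the real culprit is not docker inspect but docker exec.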

To dig further, we need a little background. The full call path when kubelet starts a container or runs a command inside one is:

+--------------------------------------------------------------+
|                                                              |
|   +------------+                                             |
|   |            |                                             |
|   |   kubelet  |                                             |
|   |            |                                             |
|   +------|-----+                                             |
|          |                                                   |
|          |                                                   |
|   +------v-----+       +---------------+                     |
|   |            |       |               |                     |
|   |   dockerd  ------->|  containerd   |                     |
|   |            |       |               |                     |
|   +------------+       +-------|-------+                     |
|                                |                             |
|                                |                             |
|                        +-------v-------+     +-----------+   |
|                        |               |     |           |   |
|                        |containerd-shim----->|   runc    |   |
|                        |               |     |           |   |
|                        +---------------+     +-----------+   |
|                                                              |
+--------------------------------------------------------------+

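dockerd and containerd can be thought of as two layers of nginx-style proxies. containerd-shim is the container's guardian, and runc is the one that actually does the work of starting containers and executing commands. What runc does is quite simple: it creates namespaces (NS) according to the user's configuration, or enters existing ones, and then runs the user's command. Put plainly, creating a container means creating new namespaces and running the user's command inside them.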

With this background, we moved on to containerd. Fortunately, pprof also lets us capture containerd's state at that moment:

goroutine profile: total 430

1 @ 0x7f6e55f82740 0x7f6e55f92616 0x7f6e56a8412c 0x7f6e56a83d6d 0x7f6e56a911bf 0x7f6e56ac6e3b 0x7f6e565093de 0x7f6e5650dd3b 0x7f6e5650392b 0x7f6e56b51216 0x7f6e564e5909 0x7f6e563ec76a 0x7f6e563f000a 0x7f6e563f6791 0x7f6e55fb0151
#  0x7f6e56a8412b  github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch+0x24b  /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102
#  0x7f6e56a83d6c  github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call+0x15c  /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:73
#  0x7f6e56a911be  github.com/containerd/containerd/linux/shim/v1.(*shimClient).Start+0xbe  /go/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1745
#  0x7f6e56ac6e3a  github.com/containerd/containerd/linux.(*Process).Start+0x8a  /go/src/github.com/containerd/containerd/linux/process.go:125
#  0x7f6e565093dd  github.com/containerd/containerd/services/tasks.(*local).Start+0x14d  /go/src/github.com/containerd/containerd/services/tasks/local.go:187
#  0x7f6e5650dd3a  github.com/containerd/containerd/services/tasks.(*service).Start+0x6a  /go/src/github.com/containerd/containerd/services/tasks/service.go:72
#  0x7f6e5650392a  github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler.func1+0x8a  /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:624
#  0x7f6e56b51215  github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor+0xa5  /go/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29
#  0x7f6e564e5908  github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler+0x168  /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:626
#  0x7f6e563ec769  github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC+0x849  /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:920
#  0x7f6e563f0009  github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream+0x1319  /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1142
#  0x7f6e563f6790  github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1+0xa0  /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:637

Again we kept only the key goroutine. Its stack shows that containerd is blocked waiting for the result of the exec. The relevant code is shown below as evidence:

func (c *Client) dispatch(ctx context.Context, req *Request, resp *Response) error {
    errs := make(chan error, 1)
    call := &callRequest{
        req:  req,
        resp: resp,
        errs: errs,
    }
    select {
    case c.calls

